Tradeoffs for nearest neighbors on the sphere
Abstract
We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the spherical filters recently introduced by [Becker–Ducas–Gama–Laarhoven, SODA’16] to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity $n^{\rho_q}$ and update complexity $n^{\rho_u}$ for data sets of size $n$ can be summarized by the following equation in terms of the approximation factor $c$ and the exponents $\rho_q$ and $\rho_u$: $c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}$. For small $c = 1 + \varepsilon$, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity of approximately $n^{1 - 4\varepsilon^2}$. Balancing the query and update costs leads to optimal complexities of $n^{1/(2c^2 - 1)}$, matching lower bounds from [Andoni–Razenshteyn, 2015] and [Dubiner, IEEE Trans. Inf. Theory 2010] and matching the asymptotic complexities previously obtained by [Andoni–Razenshteyn, STOC’15] and [Andoni–Indyk–Laarhoven–Razenshteyn–Schmidt, NIPS’15]. A subpolynomial query time complexity $n^{o(1)}$ can be achieved at the cost of a space complexity of the order $n^{1/(4\varepsilon^2)}$, matching the lower bound $n^{\Omega(1/\varepsilon^2)}$ of [Andoni–Indyk–Pǎtraşcu, FOCS’06] and [Panigrahy–Talwar–Wieder, FOCS’10] and improving upon results of [Indyk–Motwani, STOC’98] and [Kushilevitz–Ostrovsky–Rabani, STOC’98] with a considerably smaller leading constant in the exponent. For large $c$, minimizing the update complexity results in a query complexity of $n^{2/c^2 + O(1/c^4)}$, improving upon the related asymptotic exponent for large $c$ of [Kapralov, PODS’15] by a factor 2, and matching the lower bound $n^{\Omega(1/c^2)}$ of [Panigrahy–Talwar–Wieder, FOCS’08]. Balancing the costs leads to optimal complexities of the order $n^{1/(2c^2 - 1)}$, while a minimum query time complexity can be achieved with update and space complexities of approximately $n^{2/c^2 + O(1/c^4)}$ and $n^{1 + 2/c^2 + O(1/c^4)}$, also improving upon the previous best exponents of Kapralov by a factor 2 for large $n$ and $c$. For the regime where $n$ is exponential in the dimension, we obtain further improvements compared to results obtained with locality-sensitive hashing. We provide explicit expressions for the query and update complexities in terms of the approximation factor $c$ and the chosen tradeoff, and we derive asymptotic results for the case of the highest possible density for random data sets.
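The sparse-regime tradeoff can be explored numerically. The sketch below is only an illustration, not code from the paper: it assumes the tradeoff reads c^2 * sqrt(rho_q) + (c^2 - 1) * sqrt(rho_u) = sqrt(2c^2 - 1), which is the form consistent with the balanced exponent 1/(2c^2 - 1) and the small-c query exponent 1 - 4ε^2 quoted in the abstract. The function names are hypothetical.

```python
import math


def max_update_exponent(c: float) -> float:
    """Largest rho_u on the assumed tradeoff curve (the point where rho_q = 0)."""
    return (2 * c * c - 1) / (c * c - 1) ** 2


def query_exponent(c: float, rho_u: float) -> float:
    """Solve c^2*sqrt(rho_q) + (c^2 - 1)*sqrt(rho_u) = sqrt(2c^2 - 1) for rho_q.

    This equation is an assumed reading of the abstract's tradeoff.
    """
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if rho_u < 0 or rho_u > max_update_exponent(c) + 1e-12:
        raise ValueError("rho_u lies outside the tradeoff curve")
    rhs = math.sqrt(2 * c * c - 1) - (c * c - 1) * math.sqrt(rho_u)
    rhs = max(rhs, 0.0)  # guard against rounding just below zero at the endpoint
    return (rhs / (c * c)) ** 2


def balanced_exponent(c: float) -> float:
    """Exponent at which query and update costs coincide: 1 / (2c^2 - 1)."""
    return 1.0 / (2 * c * c - 1)


if __name__ == "__main__":
    c = 2.0
    # Walk along the curve from cheapest updates to cheapest queries.
    for rho_u in (0.0, balanced_exponent(c), max_update_exponent(c)):
        print(f"c={c}, rho_u={rho_u:.4f}, rho_q={query_exponent(c, rho_u):.4f}")
```

The three sample points correspond to the regimes named in the abstract: minimal update cost (rho_u = 0), balanced costs, and minimal query cost (rho_q = 0).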
Journal title:
- CoRR
Volume: abs/1511.07527
Publication date: 2015